gusucode.com > 《MATLAB图像与视频处理实用案例详解》代码 > 《MATLAB图像与视频处理实用案例详解》代码/第 19 章 基于语音识别的信号灯图像模拟控制技术/voicebox/dypsa.m
function [gci,goi] = dypsa(s,fs) %DYPSA Derive glottal closure instances from speech [gci,goi] = (s,fs) % Note: Needs to be combined with a voiced-voiceless detector to eliminate % spurious closures in unvoiced and silent regions. % % Inputs: % s is the speech signal % fs is the sampling frequncy % % Outputs: % gci is a vector of glottal closure sample numbers % gco is a vector of glottal opening sample numbers derived from % an assumed constant closed-phase fraction % % References: % [1] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, 揈stimation of Glottal Closure % Instants in Voiced Speech using the DYPSA Algorithm, IEEE Trans on Speech and Audio % Processing, vol. 15, pp. 34?3, Jan. 2007. % [2] M. Brookes, P. A. Naylor, and J. Gudnason, 揂 Quantitative Assessment of Group Delay Methods % for Identifying Glottal Closures in Voiced Speech, IEEE Trans on Speech & Audio Processing, % vol. 14, no. 2, pp. 456?66, Mar. 2006. % [3] A. Kounoudes, P. A. Naylor, and M. Brookes, 揟he DYPSA algorithm for estimation of glottal % closure instants in voiced speech, in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349?52. % [4] C. Ma, Y. Kamp, and L. F. Willems, 揂 Frobenius norm approach to glottal closure detection % from the speech signal, IEEE Trans. Speech Audio Processing, vol. 2, pp. 258?65, Apr. 1994. % [5] A. Kounoudes, 揈poch Estimation for Closed-Phase Analysis of Speech, PhD Thesis, % Imperial College, 2001. % Algorithm Parameters % The following parameters are defined in voicebox() % % dy_cpfrac=0.3; % presumed closed phase fraction of larynx cycle % dy_cproj=0.2; % cost of projected candidate % dy_cspurt=-0.45; % cost of a talkspurt % dy_dopsp=1; % Use phase slope projection (1) or not (0)? % dy_ewdly=0.0008; % window delay for energy cost function term [~ energy peak delay from closure] (sec) % dy_ewlen=0.003; % window length for energy cost function term (sec) % dy_ewtaper=0.001; % taper length for energy cost function window (sec) % dy_fwlen=0.00045; % window length used to smooth group delay (sec) % dy_fxmax=500; % max larynx frequency (Hz) % dy_fxmin=50; % min larynx frequency (Hz) % dy_fxminf=60; % min larynx frequency (Hz) [used for Frobenius norm only] % dy_gwlen=0.0030; % group delay evaluation window length (sec) % dy_lpcdur=0.020; % lpc analysis frame length (sec) % dy_lpcn=2; % lpc additional poles % dy_lpcnf=0.001; % lpc poles per Hz (1/Hz) % dy_lpcstep=0.010; % lpc analysis step (sec) % dy_nbest=5; % Number of NBest paths to keep % dy_preemph=50; % pre-emphasis filter frequency (Hz) (to avoid preemphasis, make this very large) % dy_spitch=0.2; % scale factor for pitch deviation cost % dy_wener=0.3; % DP energy weighting % dy_wpitch=0.5; % DP pitch weighting % dy_wslope=0.1; % DP group delay slope weighting % dy_wxcorr=0.8; % DP cross correlation weighting % dy_xwlen=0.01; % cross-correlation length for waveform similarity (sec) % Revision History: % % 3.0 - 29 Jun 2006 - Rewrote DP function to improve speed % 2.6 - 29 Jun 2006 - Tidied up algorithm parameters % 2.4 - 10 Jun 2006 - Made into a single file aand put into VOICEBOX % 2.3 - 18 Mar 2005 - Removed 4kHz filtering of phase-slope function % 2.2 - 05 Oct 2004 - dpgci uses the slopes returned from xewgrdel % - gdwav from speech with fs<9000 is not filtered % - Various outputs and inputs of functions have been % removed since now there is no plotting % 1.0 - 30 Jan 2001 - Initial version [5] % Bugs: % 1. Allow the projections only to extend to the end of the larynx cycle % 2. Compensate for false pitch period cost at the start of a voicespurt % 3. Should include energy and pahse-slope costs for the first closeure of a voicespurt % 4. should delete candidates that are too close to the beginning or end of speech for the cost measures % currently this is 200 samples fixed in the main routine but it should adapt to window lengths of % cross-correlation, lpc and energy measures. % 5. Should have an integrated voiced/voiceless detector % 6. Allow dypsa to be called in chunks for a long speech file % 7. Do forward & backward passes to allow for gradient descent and/or discriminative training % 8. Glottal opening approximation does not really belong in this function % 9. The cross correlation window is asymmetric (and overcomplex) for no very good reason % 10. Incorporate -0.5 factor into dy_wxcorr and abolish erroneous (nx2-1)/(nx2-2) factor % 11. Add in talkspurt cost at the beginning rather than the end of a spurt (more efficient) % 12. Remove qmin>2 condition from voicespurt start detection (DYPSA 2 compatibility) in two places % 13. Include energy and phase-slope costs at the start of a voicespurt % 14. Single-closure voicespurt are only allowed if nbest=1 (should always be forbidden) % 15. Penultimate closure candidate is always acceptd % 16. Final element of gcic, Cfn and Ch is unused % 17. Needs to cope better with irregular voicing (e.g. creaky voice) % 18. Should give non-integer GCI positions for improved accuracy % 19. Remove constraint that first voicespurt cannot begin until qrmax after the first candidate % Copyright (C) Tasos Kounoudes, Jon Gudnason, Patrick Naylor and Mike Brookes 2006 % Version: $Id: dypsa.m,v 3.6 2007/05/04 07:01:38 dmb Exp $ % % VOICEBOX is a MATLAB toolbox for speech processing. % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This program is free software; you can redistribute it and/or modify % it under the terms of the GNU General Public License as published by % the Free Software Foundation; either version 2 of the License, or % (at your option) any later version. % % This program is distributed in the hope that it will be useful, % but WITHOUT ANY WARRANTY; without even the implied warranty of % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the % GNU General Public License for more details. % % You can obtain a copy of the GNU General Public License from % http://www.gnu.org/copyleft/gpl.html or by writing to % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Extract algorithm constants from VOICEBOX dy_preemph=voicebox('dy_preemph'); dy_lpcstep=voicebox('dy_lpcstep'); dy_lpcdur=voicebox('dy_lpcdur'); dy_dopsp=voicebox('dy_dopsp'); % Use phase slope projection (1) or not (0)? dy_ewtaper=voicebox('dy_ewtaper'); % Prediction order of FrobNorm method in seconds dy_ewlen=voicebox('dy_ewlen'); % windowlength of FrobNorm method in seconds dy_ewdly=voicebox('dy_ewdly'); % shift for assymetric speech shape at start of voiced cycle dy_cpfrac=voicebox('dy_cpfrac'); % presumed ratio of larynx cycle that is closed dy_lpcnf=voicebox('dy_lpcnf'); % lpc poles per Hz (1/Hz) dy_lpcn=voicebox('dy_lpcn'); % lpc additional poles lpcord=ceil(fs*dy_lpcnf+dy_lpcn); % lpc poles %PreEmphasise input speech s_used=filter([1 -exp(-2*pi*dy_preemph/fs)],1,s); % perform LPC analysis, AC method with Hamming windowing [ar, e, k] = lpcauto(s_used,lpcord,floor([dy_lpcstep dy_lpcdur]*fs)); if any(any(isinf(ar))) % if the data is bad and gives infinite prediction coefficients we return with a warning warning('No GCIs returned'); gci=[]; return; end; % compute the prediction residual r = lpcifilt(s_used,ar,k); % compute the group delay function: EW method from reference [2] above [zcr_cand,sew,gdwav,toff]=xewgrdel(r,fs); gdwav=-[zeros(toff,1); gdwav(1:end-toff)]; zcr_cand=[round(zcr_cand), ones(size(zcr_cand))]; %flag zero crossing candidates with ones sew=0.5+sew'; %the phase slope cost of each candidate pro_cand=[]; if dy_dopsp ~= 0 pro_cand = psp(gdwav,fs); pro_cand = [pro_cand, zeros(length(pro_cand),1)]; %flag projected candidates with zeros sew = [sew zeros(1,size(pro_cand,1))]; %the phase slope cost of a projected candidate is zero end; %Sort the zero crossing and projected candidates together and remove any candidates that %are within 200 samples from the speech boundary [gcic,sin] = sortrows([zcr_cand; pro_cand],1); sew=sew(sin); sin=find(and(200<gcic,gcic<length(gdwav)-200)); gcic=gcic(sin,:); sew=sew(sin); % compute the frobenious norm function used for a cost in the DP fnwav=frobfun(s_used,dy_ewtaper*fs,dy_ewlen*fs,dy_ewdly*fs); %Dynamic programming, picks the most likely candidates based on the pitch consistency, energy etc. [gci] = dpgci(gcic, s_used(:), sew, fnwav, fs); %Evaluate goi ... determined to be dy_cpfrac percentage of the larynx cylce away from the last gci goi=simplegci2goi(gci,dy_cpfrac); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function z = psp(g,fs) %PSP Calculates the phase slope projections of the group delay function % Z = PSP(G) computes the % Revision History: % V1.0 July 12th 2002: % Nov. 6th 2002 : if statement added to remove "last" midpoint g = g(:); gdot = [diff(g);0]; gdotdot = [diff(gdot);0]; % find the turning points as follows: [tp_number, index_of_tp, min(1) or max(-1), g(index_of_tp)] turningPoints = zcr(gdot); turningPoints = [[1:length(turningPoints)]', turningPoints, sign(gdotdot(turningPoints)), g(turningPoints)]; % useful for debug/plotting %tplot = zeros(length(g),1); %tplot(turningPoints(:,1)) = turningPoints(:,2); % find any maxima which are < 0 negmaxima = turningPoints(find(turningPoints(:,3) == -1 & turningPoints(:,4) < 0 & turningPoints(:,1)~=1),:); %Change 01.05.2003 JG: The first row can't be included % find the midpoint between the preceding min and the negative max nmi = negmaxima(:,1); midPointIndex = turningPoints(nmi-1,2) + round(0.5*(turningPoints(nmi,2) - turningPoints(nmi-1,2))); midPointValue = g(midPointIndex); % project a zero crossing with unit slope nz = midPointIndex - round(midPointValue); % find any minima which are > 0 posminima = turningPoints(find(turningPoints(:,3) == 1 & turningPoints(:,4) > 0),:); % find the midpoint between the positive min and the following max pmi = posminima(:,1); %Remove last midpoint if it is the last sample if ~isempty(pmi), if pmi(end)==size(turningPoints,1), pmi=pmi(1:end-1); end; end; midPointIndex = turningPoints(pmi,2) + round(0.5*(turningPoints(pmi+1,2) - turningPoints(pmi,2))); midPointValue = g(midPointIndex); % project a zero crossing with unit slope pz = midPointIndex - round(midPointValue); z = sort([nz;pz]); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function i = zcr(x, p) %ZCR Finds the indices in a vector to zero crossings % I = ZCR(X) finds the indices of vector X which are closest to zero-crossings. % I = ZCR(X, P) finds indices for positive-going zeros-crossings for P=1 and % negative-going zero-crossings for P=0. x = x(:); if (nargin==2) if (p==0) z1 = zcrp(x); % find positive going zero-crossings elseif (p==1) z1 = zcrp(-x); % find negative going zero-crossings else error('ZCR: invalid input parameter 2: must be 0 or 1'); end else z1 = [zcrp(x); zcrp(-x)]; end % find crossings when x==0 exactly z0 = find( (x(1:length(x))==0) & ([x(2:length(x));0] ~= 0)); % concatenate and sort the two types of zero-crossings i = sort([z0; z1]); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function zz = zcrp(xx) %only used in zcr % find positive-going zero-crossing z1 = find(diff(sign(xx)) == -2); % find which out of current sample or next sample is closer to zero [m, z2] = min([abs(xx(z1)), abs(xx(z1+1))], [], 2); zz = z1 -1 + z2; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [frob]=frobfun(sp,p,m,offset) % [frob]=frobfun(sp,p,m) % % sp is the speech signal assumed to be preemphasised % p is the prediction order : recomended to be 1 ms in above paper % m is the window length : recomended to be 1 ms in above paper % offset is shift for assymetric speech shape at start of voiced cycle - % default 1.5ms. % % This function implements the frobenius norm based measure C defined in [4] below. % It equals the square of the Frobenius norm of the m by p+1 data matrix divided by p+1 % % Reference: % [4] C. Ma, Y. Kamp, and L. F. Willems, 揂 Frobenius norm approach to glottal closure detection % from the speech signal, IEEE Trans. Speech Audio Processing, vol. 2, pp. 258?65, Apr. 1994. % Revision History: % V1.0 July 12th 2002: % Nov. 6th 2002 : if statement added to remove "last" midpoint %force p m and offset to be integers p=round(p); m=round(m); offset=round(offset); w=(p+1)*ones(1,m+p); w(1:p)=1:p; w(m+1:p+m)=p:-1:1; w=w./(p+1); frob=filter(w,1,sp.^2); frob(1:(round((p+m-1)/2) + offset))=[]; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function goi=simplegci2goi(gci,pr) % Estimate glottal opening instants by assuming a fixed closed-phase fraction gci=round(gci); maxpitch=max(medfilt1(diff(gci),7)); % calculate opening instants for kg=1:length(gci)-1 goi(kg)=gci(kg)+min(pr*(gci(kg+1)-gci(kg)),pr*maxpitch); end; kg=kg+1; goi(kg)=round(gci(kg)+pr*(gci(kg)-gci(kg-1))); %use the previous pitch period instead goi=round(goi); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [tew,sew,y,toff]=xewgrdel(u,fs) % implement EW group delay epoch extraction dy_gwlen=voicebox('dy_gwlen'); % group delay evaluation window length dy_fwlen=voicebox('dy_fwlen'); % window length used to smooth group delay % perform group delay calculation gw=2*floor(dy_gwlen*fs/2)+1; % force window length to be odd ghw=window('hamming',gw,'s'); ghw = ghw(:); % force to be a column (dmb thinks window gives a row - and he should know as he wrote it!) ghwn=ghw'.*(gw-1:-2:1-gw)/2; % weighted window: zero in middle u2=u.^2; yn=filter(ghwn,1,u2); yd=filter(ghw,1,u2); yd(abs(yd)<eps)=10*eps; % prevent infinities y=yn(gw:end)./yd(gw:end); % delete filter startup transient toff=(gw-1)/2; fw=2*floor(dy_fwlen*fs/2)+1; % force window length to be odd if fw>1 daw=window('hamming',fw,'s'); y=filter(daw,1,y)/sum(daw); % low pass filter toff=toff-(fw-1)/2; end [tew,sew]=zerocros(y,'n'); % find zero crossings tew=tew+toff; % compensate for filter delay and frame advance %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function Cfn=fnrg(gcic,frob,fs) %Frobenious Energy Cost dy_fxminf=voicebox('dy_fxminf'); frob=frob(:)'; mm=round(fs/dy_fxminf); mfrob=maxfilt(frob,1,mm); mfrob=[mfrob(floor(mm/2)+1:end) max(frob(end-ceil(mm/2):end))*ones(1,floor(mm/2))]; rfr=frob./mfrob; Cfn=0.5-rfr(round(gcic)); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function gci=dpgci(gcic, s, Ch, fnwav, fs) %DPGCI Choose the best Glottal Closure Instances with Dynamic Programming % gci=dpgci(gcic, s(:), Ch, fnwav, fs) returns vectors of sample indices corresponding % to the instants of glottal closure in the speech signal s at sampling frequency fs Hz. % % Inputs: % gcic is a matrix whos first column are the glottal closure instance candidates and % the second column is 1 if the corresponding gci is derived from a zero crossing % but zero if the gci is from a a projected zero crossing % s is the speech signal - MUST be a column vector % Ch the phase slope cost of every candidate % fnwav is the frobenious norm function of s % fs is the sampling frequncy % % Outputs: % gci is a vector of glottal closure instances chosen by the DP % Revision History: % Bugs: Constants are hardwired but defined in a structure like pv (defined in grpdelpv) % % get algorithm parameters from voicebox() dy_fxmin=voicebox('dy_fxmin'); % min larynx frequency (Hz) dy_fxmax=voicebox('dy_fxmax'); % min larynx frequency (Hz) dy_xwlen=voicebox('dy_xwlen'); % cross-correlation length for waveform similarity (sec) dy_nbest=voicebox('dy_nbest'); % Number of NBest paths to keep dy_spitch=voicebox('dy_spitch'); % scale factor for pitch deviation cost wproj=voicebox('dy_cproj'); % cost of projected candidate dy_cspurt=voicebox('dy_cspurt'); % cost of a talkspurt dy_wpitch=voicebox('dy_wpitch'); % DP pitch weighting dy_wener=voicebox('dy_wener'); % DP energy weighting dy_wslope=voicebox('dy_wslope'); % DP group delay slope weighting dy_wxcorr=voicebox('dy_wxcorr'); % DP cross correlation weighting %Constants Ncand=length(gcic); sv2i=-(2*dy_spitch^2)^(-1); % scale factor for pitch deviation cost %Limit the search: qrmin=ceil(fs/dy_fxmax); qrmax=floor(fs/dy_fxmin); %Cost and tracking r = current, q = previous, p = preprevious cost=zeros(Ncand, dy_nbest); cost(:,:)=inf; %Cost matrix, one row for each candidate maxcost=zeros(Ncand,1); maxcost(:,:)=inf; %Maximum cost in each row imaxcost=ones(Ncand,1); %Index of maximum cost prev = ones(Ncand, dy_nbest); %index of previous, q candidates ind = ones(Ncand, dy_nbest); %index of p in row q (from prev) qbest = [zeros(Ncand,1), ones(Ncand,2)]; % the minimum cost in any previous q [cost,q,i] Cfn=fnrg(gcic(:,1),fnwav,fs); %Frob.Energy Cost %Add start and end state % === should probably delete candidates that are too close to either end of the input % === why do we ever need the additional one at the tail end ? gcic=[[gcic(1,1)-qrmax-2 0];gcic;[gcic(end,1)+qrmax+2 0]]; Cfn=[0 Cfn 0]; Ch = [0 Ch 0]; % first do parallelized version nxc=ceil(dy_xwlen*fs); % cross correlation window length in samples % === should delete any gci's that are within this of the end. % === and for energy window % rather complicated window specification is for compatibility with DYPSA 2 % === +1 below is for compatibility - probably a bug wavix=(-floor(nxc/2):floor(nxc/2)+1)'; % indexes for segments [nx2,1] nx2=length(wavix); sqnx2=sqrt(nx2); g_cr=dy_wener*Cfn+dy_wslope*Ch+wproj*(1-gcic(:,2))'; % fixed costs g_n=gcic(:,1)'; % gci sample number [1,Ncand+2] g_pr=gcic(:,2)'; % unprojected flag [1,Ncand+2] g_sqm=zeros(1,Ncand+1); % stores: sqrt(nx2) * mean for speech similarity waveform g_sd=zeros(1,Ncand+1); % stores: 1/(Std deviation * sqrt(nx2)) for speech similarity waveform f_pq=zeros((Ncand+1)*dy_nbest,1); % (q-p) period for each node f_c=repmat(Inf,(Ncand+1)*dy_nbest,1); % cumulative cost for each node - initialise to inf f_c(1)=0; % initial cost of zero for starting node % f_costs=zeros(Ncand*dy_nbest,6); % === debugging only remember costs of candidate f_f=ones((Ncand+1)*dy_nbest,1); % previous node in path f_fb=ones((Ncand+1),1); % points back to best end-of-spurt node fbestc=0; % cost of best end-of-spurt node qmin=2; for r=2:Ncand+1 % if r==86 % r; % end r_n=g_n(r); % sample number of r = current candidate rix=dy_nbest*(r-1)+(1:dy_nbest); % index range within node variables % determine the range of feasible q candidates qmin0=qmin; qmin=find(g_n(qmin0-1:r-1)<r_n-qrmax); % qmin is the nearest candidate that is >qrmax away qmin=qmin(end)+qmin0-1; % convert to absolute index of first viable candidate qmax=find(g_n(qmin-1:r-1)<=r_n-qrmin); % qmax is the nearest candidate that is >=qrmin away qmax=qmax(end)+qmin-2; % calculate waveform similarity cost measure statistics sr=s(r_n+wavix); % note s MUST be a column vector so sr is also wsum=sum(sr); g_sqm(r)=wsum/sqnx2; % mean * sqrt(nx2) g_sd(r)=1/sqrt(sr.'*sr-wsum^2/nx2); % 1/(Std deviation * sqrt(nx2)) % now process the candidates if qmin<=qmax qix=qmin:qmax; % q index nq=length(qix); % === should integrate the -0.5 into dy_wxcorr % === the factor (nx2-1)/(nx2-2) is to compensate for a bug in swsc() q_cas=-0.5*(nx2-1)/(nx2-2)*dy_wxcorr*(sum(s(repmat(g_n(qix),nx2,1)+repmat(wavix,1,nq)).*repmat(sr,1,nq),1)-g_sqm(qix)*g_sqm(r)).*g_sd(qix)*g_sd(r); % compare: i=35; Ca=swsc(g_n(qix(i)),g_n(r),s,fs); [i qix(i) r g_n(qix(i)) g_n(r) dy_wxcorr*Ca q_cas(i)] % now calculate pitch deviation cost fix = 1+(qmin-1)*dy_nbest:qmax*dy_nbest; % node index range f_qr=repmat(r_n-g_n(qix),dy_nbest,1); % (r-p) period for each node f_pr=f_qr(:)+f_pq(fix); % === could absorb the 2 into sv2i f_nx=2-2*f_pr./(f_pr+abs(f_qr(:)-f_pq(fix))); f_cp=dy_wpitch*(0.5-exp(sv2i*f_nx.^2)); % === fudge to match dypsa2.4 - could more efficiently be added % === onto the cost of a talkspurt end % === should be a voicebox parameter anyway f_cp(f_pq(fix)==0)=dy_cspurt*dy_wpitch; % now find the N-best paths [r_cnb,nbix]=sort(f_c(fix)+f_cp+reshape(repmat(q_cas,dy_nbest,1),nq*dy_nbest,1)); f_c(rix)=r_cnb(1:dy_nbest)+g_cr(r); % costs f_f(rix)=nbix(1:dy_nbest)+(qmin-1)*dy_nbest; % traceback nodes f_pq(rix)=f_qr(nbix(1:dy_nbest)); % previous period % === f_costs is only for debugging % r; % f_costs(rix,1)=f_c(fix(nbix(1:dy_nbest))); % f_costs(rix,2)=wproj*(1-gcic(r,2)); % f_costs(rix,3)=f_cp(nbix(1:dy_nbest)); % f_costs(rix,4)=dy_wener*Cfn(r); % f_costs(rix,5)=dy_wslope*Ch(r); % f_costs(rix,6)=reshape(q_cas(1+floor((nbix(1:dy_nbest)-1)/dy_nbest)),dy_nbest,1); % check cost of using this candidate as the start of a new spurt % ==== the qmin>2 condition is for compatibility with dypsa 2 and % prevents any spurts starting until at least qrmax past the first % gci. This is probably a bug (see again below) iNb=rix(end); if (qmin>2) && (f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2))<f_c(iNb)) % compare with worst of Nbest paths f_f(iNb)=f_fb(qmin-1); % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2 % === this is probably a bug f_c(iNb)=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2)); % replace worst of the costs with start voicespurt cost f_pq(iNb)=0; % false pq period end if f_c(rix(1))<fbestc f_fb(r)=rix(1); % points to the node with lowest end-of-spurt cost % === should compensate for the pitch period cost incurred at the start of the next spurt % === note that a node can never be a one-node voicespurt on its own unless dy_nbest=1 % since the start voices[purt option replaced the worst Nbest cost. This is probably good but % is a bit inconsistent. fbestc=f_c(rix(1)); else f_fb(r)=f_fb(r-1); end else % no viable candidates - must be the start of a voicespurt if anything % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2 % === this is probably a bug % ==== the qmin>2 condition is for compatibility with dypsa 2 and % prevents any spurts starting until at least qrmax past the first % gci. This is probably a bug (see again above) if (qmin>2) f_c(rix(1))=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2)); % cost of new voicespurt f_f(rix)=f_fb(qmin-1); % traceback to previous talkspurt end f_pq(rix)=0; % previous period end f_fb(r)=f_fb(r-1); % cannot be the end of a voicespurt end end % now do the traceback gci = zeros(1,Ncand+1); % === for compatibility with dypsa2, we force the penultimate candidate to be accepted % === should be: i=f_fb(Ncand+1) but instead we pick the best of the penultimate candidate i=rix(1)-dy_nbest; if f_c(i-dy_nbest+1)<f_c(i) % check if start of a talkspurt i=i-dy_nbest+1; end k=1; while i>1 j=1+floor((i-1)/dy_nbest); % convert node number to candidate number gci(k)=g_n(j); i=f_f(i); k=k+1; end gci=gci(k-1:-1:1); % put into ascending order %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%